Windows-1252, UTF-8, Latin-1, or EBCDIC. What's a poem to me is terminal garbage to you.
Over the years, hacks have evolved. We have magic numbers, and plain ole' hacks to just guess based on the content. Of course, like all good computer programs, this has led to its fair share of hilarious bugs, and there's nothing stopping files from (validly!) being multiple things at the same time.
Like many things, it's all in the eye of the beholder.
Timezones
Just like Unicode, this is a word that can put your friendly neighborhood programmer into a series of profanity-laden tirades. Go find one in the wild and ask them what they think about timezone handling bugs they've seen. I'll wait. Go ahead.
Rants are funny things. They're fun to watch. Hilarious to give. Sometimes
just getting it all out can help. They can tell you a lot about the true
nature of problems.
It's funny to consider the isomorphic nature of Unicode rants and Timezone
rants.
I don't think this is an accident.
Unicode Timezone Sandwich
Ned's Unicode Sandwich applies -- As early as we can, in the lowest level
we can (reading from the database, filesystem, wherever!), all datetimes
must be timezone qualified with their correct timezone. Always. If you mean
UTC, say it's in UTC.
Treat any unqualified datetimes as "bytes". They're not to be trusted.
Never, never, never trust 'em. Don't
process any datetimes until you're sure they're in the right timezone.
This lets the delicious inside of your datetime sandwich handle timezones with grace, and finally, as late as you can, turn it back into bytes (if at all!). Treat locations as tzdb entries, and qualify datetime objects into their absolute timezone (EST, EDT, PST, PDT).
It's not until you want to show the datetime to the user again that you should consider how to re-encode your datetime to bytes. You should think about what flavor of bytes, what encoding -- what timezone -- to encode into.
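As a minimal sketch of that sandwich in Python (the function names, and the assumption that the upstream system documents its timestamps as UTC, are mine, not a prescribed API):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; pytz expresses the same idea

def load_event_time(raw: str) -> datetime:
    # Bottom slice: qualify as early as possible. We assume the
    # upstream system documents these timestamps as UTC.
    naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)

def show_event_time(dt: datetime, tz_name: str) -> str:
    # Top slice: re-encode as late as possible, for display only.
    assert dt.tzinfo is not None, "refusing to process a naive datetime"
    return dt.astimezone(ZoneInfo(tz_name)).isoformat()

print(show_event_time(load_event_time("2016-11-06 05:30:00"), "America/New_York"))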
TEST
Just like Unicode, testing that your code works with datetimes is important.
Every time I think about how to go about doing this, I think about that
one time that mjg59 couldn't book a flight
starting Tuesday from AKL, landing in HNL on Monday night, because
United couldn't book the last leg to SFO. Do you ever assume dates only go
forward as time goes on? Remember timezones.
Construct test data, make sure someone in New Zealand's
+13:45 can correctly talk with
their friends in
Baker Island's -12:00,
and that the events sort right.
Just because it's noon on New Year's Eve in England doesn't mean it's not 1 AM the next year in New Zealand. Places a few miles apart may go on daylight saving time on different days. Indian Standard Time is not even aligned on the hour to GMT (+05:30)!
Test early, and test often. Memorize a few timezones, and challenge
your assumptions when writing code that has to do with time. Don't use
wall clocks to mean monotonic time. Remember there's a whole world out there,
and we only deal with part of it.
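Here is a sketch of the kind of test data this suggests, using Python's zoneinfo; the zone choices are mine, and Etc/GMT+12 is the POSIX-style tzdb name for -12:00 (the sign is inverted by convention):

from datetime import datetime
from zoneinfo import ZoneInfo

def test_events_sort_across_timezones():
    # Noon on New Year's Eve in England...
    london = datetime(2016, 12, 31, 12, 0, tzinfo=ZoneInfo("Europe/London"))
    # ...is already 1 AM the next year in Auckland (UTC+13 in NZ summer).
    auckland = london.astimezone(ZoneInfo("Pacific/Auckland"))
    assert (auckland.year, auckland.month, auckland.hour) == (2017, 1, 1)

    # New Zealand's Chatham Islands (+13:45 in DST) vs. Baker Island's
    # -12:00: sorting must compare absolute instants, not wall clocks.
    chatham = datetime(2016, 1, 1, 0, 0, tzinfo=ZoneInfo("Pacific/Chatham"))
    baker = datetime(2016, 1, 1, 0, 0, tzinfo=ZoneInfo("Etc/GMT+12"))
    assert sorted([baker, chatham]) == [chatham, baker]

test_events_sort_across_timezones()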
It's also worth remembering, as Andrew Pendleton pointed out to me, that it's possible that a datetime isn't even unique for a place, since you can never know if 2016-11-06 01:00:00 in America/New_York (in the tzdb) is the first one, or the second one. Storing EST or EDT along with your datetime may help, though!
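Python's PEP 495 fold attribute can express exactly this ambiguity; a small illustration:

from datetime import datetime
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
# 2016-11-06 01:00:00 happened twice in New York: once in EDT, once in EST.
first = datetime(2016, 11, 6, 1, 0, tzinfo=ny)           # fold=0: the EDT one
second = datetime(2016, 11, 6, 1, 0, fold=1, tzinfo=ny)  # fold=1: the EST one
print(first.tzname(), second.tzname())  # EDT EST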
Pitfalls
Improper handling of timezones can lead to some interesting things, and failing to be explicit (or at least, very rigid) in what you expect will lead to an unholy class of bugs we've all come to hate. At best, you have confused users doing math; at worst, someone misses a critical event, or your security code fails.
I recently found what I regard to be a pretty bad bug in apt (which David has prepared a fix for, pending upload, yay! Thank you!), which boiled down to documentation and code expecting datetimes in a given timezone, but accepting any timezone, and silently treating it as UTC.
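To make that bug class concrete, here's a small Python sketch (apt itself is C++; the header value and the five-hour offset below are illustrative only, not taken from the actual bug report):

from datetime import timezone
from email.utils import parsedate_to_datetime

# HTTP dates are required to be GMT, but a sloppy parser may accept any
# zone and then treat the wall-clock fields as UTC anyway.
hdr = "Sun, 06 Nov 2016 01:00:00 -0500"
dt = parsedate_to_datetime(hdr)          # correctly tz-aware: -05:00

buggy = dt.replace(tzinfo=timezone.utc)  # silently "treating it as UTC"
print(dt.timestamp() - buggy.timestamp())  # off by five hours: 18000.0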
The solution is to hard-fail, which is an interesting choice to me (as a vocal fan of timezone-aware code), but at the least it won't fail by misunderstanding what the server is trying to communicate, and I do understand and empathize with the situation the apt maintainers are in.
Final Thoughts
Overall, my main point is that although most modern developers know how to deal with Unicode pain, I think there is a more general lesson to learn -- namely, you should always know what data you have, and always remember what it is. Understand assumptions as early as you can, and always store them with the data.
The MIME type registration for Rosegarden files looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
  <mime-type type="audio/x-rosegarden">
    <sub-class-of type="application/x-gzip"/>
    <comment>Rosegarden project file</comment>
    <glob pattern="*.rg"/>
  </mime-type>
</mime-info>

This states that audio/x-rosegarden is a kind of application/x-gzip (it is a gzipped XML file). Note that it is much better to use an official MIME type registered with IANA than to make up one's own unofficial ones, like the x-rosegarden type used by Rosegarden. The desktop file of the Rosegarden program failed to list audio/x-rosegarden in its list of supported MIME types, causing file browsers to have no idea what to do with *.rg files:

% grep Mime /usr/share/applications/rosegarden.desktop
MimeType=audio/x-rosegarden-composition;audio/x-rosegarden-device;audio/x-rosegarden-project;audio/x-rosegarden-template;audio/midi;
X-KDE-NativeMimeType=audio/x-rosegarden-composition
%

The fix was to add "audio/x-rosegarden;" at the end of the MimeType= line. If you run into a file which fails to open in the correct program when selected from the file browser, please check the output from file --mime-type for the file, ensure the file ending and MIME type are registered somewhere under /usr/share/mime/, and check that some desktop file under /usr/share/applications/ claims support for this MIME type. If not, please report a bug to have it fixed. :)
There are currently at least three ways to refer to a GPG key: the short key ID (last 8 hex digits of the fingerprint), the long key ID (last 16 hex digits) and the full fingerprint. The short key ID used to be popular, and it has been known for five years that it is computationally easy to generate a GnuPG key with an arbitrary short key ID. A mitigation is setting "keyid-format long" in gpg.conf, and a better thing to do, especially in scripts, is to use the full fingerprint to refer to a key, or to just ship the public key for verification and skip the key servers. Note that in case of a key ID collision, gpg will download and import all the matching keys, and will use all of them when verifying signatures.

So... What is this about? We humans are quite bad at recognizing and remembering randomly-generated strings with no inherent patterns in them. Every GPG key can be uniquely identified by its fingerprint, a 160-bit string, usually encoded as ten blocks of four hexadecimal characters. That is, my (full) key's fingerprint is:

AB41 C1C6 8AFD 668C A045 EBF8 673A 03E4 C1DB 921F

However, it's quite hard to recognize such a long string, let alone memorize it! So, we often do what humans do: given that strong cryptography implies a homogeneous probability distribution, people compromised on using just a portion of the key: the last portion. The short key ID. Mine is then the last two blocks: C1DB921F. We can also use what's known as the long key ID, which is twice as long: 64 bits. However, while I can speak my short key ID in a single breath (and maybe even expect you to remember and note it down), try doing so with the long one: 673A03E4C1DB921F. Nah. Too much for our little, analog brains.

This short and almost-rememberable number has only 32 bits of entropy: I have less than a one in 4,000,000,000 chance of generating a new key with this same short key ID. Besides, key generation is a CPU-intensive operation, so it's quite unlikely we will have a collision, right?

Well, wrong.

Previous successful attacks on short key IDs

Already five years ago, Asheesh Laroia migrated his 1024D key to a 4096R. And, as he describes in his always-entertaining fashion, he made his computer sweat until he was able to create a new key for which the short key ID collided with the old one. It might not seem like a big deal, as he did this non-maliciously, but this easily should have spelt game over for the usage of short key IDs. After all, being able to generate a collision is usually the end for cryptographic systems. Asheesh specifically mentioned in his posting how this could be abused.

But we didn't listen. Short key IDs are just too convenient! Besides, they allow us to have fun; they can be a means of expression! I know of at least two keys that would qualify as vanity: Obey Arthur Liu's 0x29C0FFEE (created in 2009) and Keith Packard's 0x00000011 (created in 2012).

Then we got the Evil 32 project. They developed Scallion, started (AFAICT) in 2012. Scallion automates the search for a 32-bit collision using GPUs; they claim that it takes only four seconds to find a collision. So, they went through the strong set of the public PGP Web of Trust, and created a (32-bit-)colliding key for each of the existing keys.

And what happened now? What happened today? We still don't really know, but it seems we found the first potentially malicious collision, that is, the first "non-academic" case. Enrico found two keys sharing the 9F6C6333 short ID, apparently belonging to the same person (as would be the case of Asheesh, mentioned above). After contacting Gustavo, though, it turned out he does not know about the second key. That is, it can clearly be regarded as an impersonation attempt. Besides, what gave away this attempt are the signatures it has: both keys are signed by what appears to be the same three keys: B29B232A, F2C850CA and 789038F2. Those three keys are not (yet?) uploaded to the keyservers, though... but we can expect them to appear at any point in the future. We don't know who is behind this, or what their purpose is. We just know this looks very evil.

Now, don't panic: Gustavo's key is safe. Same for his certifiers, Marga, Agustín and Maxy. It's just a 32-bit collision. So, in principle, the only parties that could be cheated into trusting the attacker are humans, right? Nope. Enrico tested the PGP pathfinder & key statistics service, a keyserver that finds trust paths between any two arbitrary keys in the strong set. Surprise: the pathfinder works on the short key IDs, even when supplied full fingerprints.
So, it turns out I have three faked trust paths into our impostor. What next? There are several things this should urge us to do.
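In scripts, the "always use the full fingerprint" advice from the top of this section might look something like this minimal sketch (the fingerprint is the one from above; the helper name is mine, not a gpg API):

import subprocess

# Full 160-bit fingerprint (spaces removed); never the 8-digit short ID.
FPR = "AB41C1C68AFD668CA045EBF8673A03E4C1DB921F"

def fetch_key(fingerprint: str) -> None:
    # gpg accepts a full fingerprint anywhere a key ID is expected,
    # which sidesteps short-ID collisions entirely.
    subprocess.run(["gpg", "--recv-keys", fingerprint], check=True)

fetch_key(FPR)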
I farm bits and pieces out to the guys who are much more brilliant than I am. I say, "build me a laser", this. "Design me a molecular analyzer", that. They do, and I just stick 'em together. (Seth Brundle, "The Fly")

When I decided to try and turn siterefactor into staticsite, I decided that I would go ahead only for as long as it could be done with minimal work, writing code in the most straightforward way on top of existing and stable components. I am pleased by how far that went.

Python-Markdown

It works fast enough, already comes with extensions for most of what I needed, and can be extended in several ways. One of the extension methods is a hook for manipulating the ElementTree of the rendered document before serializing it to HTML, which made it really easy to go and process internal links in all <a href= and <img src= attributes.
To tell an internal link from an external link I just use the standard Python urlparse and see if the link has a scheme or a netloc component. If it does not, and if it has a path, then it is an internal link.
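A minimal sketch of that check (is_internal_link is my name for it, not staticsite's):

from urllib.parse import urlparse

def is_internal_link(href: str) -> bool:
    # External links have a scheme (https:) or a netloc (//example.com);
    # internal links have neither, just a path.
    parts = urlparse(href)
    return not parts.scheme and not parts.netloc and bool(parts.path)

assert is_internal_link("/blog/post.md")
assert is_internal_link("post.md")
assert not is_internal_link("https://www.debian.org/")
assert not is_internal_link("mailto:someone@example.org")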
This also means that I do not need to invent new Markdown syntax for internal references, avoiding the need for remembering things like [text]({{< relref "blog/post.md" >}}) or [text]({filename}/blog/post.md). In staticsite, it's just [text](/blog/post.md), or [text](post.md) if the post is nearby.
This feels nicely clean to me: if I wanted to implement fancy Markdown features, I could do it as Python-Markdown extensions and submit them upstream. If I wanted to implement fancy interlinking features, I could do it with a special URL scheme in links. For example, it would be straightforward to implement an ssite: URL scheme that expanded the URL with elements from staticsite's settings using a call to Python's string.format (ssite:{SETTING_NAME}/bar, maybe?), except I do not currently see any use cases for extending internal linking from what it is now.
Jinja2
Jinja2 is a template engine that I already knew; it is widely used, powerful, and pleasant to use, both on the templating side and on the API side. It is not HTML specific, so I can also use it to generate Atom, RSS2, "dynamic" site content, and even new site Markdown pages.
Implementing RSS and Atom feeds was just a matter of writing and testing
these Jinja2 macros
and then reusing them anywhere.
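As a tiny illustration of the approach (a toy macro in the spirit of those templates, not staticsite's actual ones):

from jinja2 import Environment, DictLoader

templates = {
    "macros.html": (
        "{% macro atom_entry(page) %}"
        "<entry><title>{{ page.title }}</title></entry>"
        "{% endmacro %}"
    ),
    "atom.xml": (
        '{% import "macros.html" as feeds %}'
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        "{% for page in pages %}{{ feeds.atom_entry(page) }}{% endfor %}"
        "</feed>"
    ),
}

env = Environment(loader=DictLoader(templates), autoescape=True)
print(env.get_template("atom.xml").render(pages=[{"title": "Hello world"}]))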
toml, yaml, json
No need to implement my own front matter parsing. Also, reusing the same syntax
as Hugo allows me to just
link to its documentation.
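A rough sketch of Hugo-style front matter detection by delimiter (not staticsite's actual code; parse_front_matter is my name):

import json
import tomllib  # Python 3.11+; the third-party 'toml' module works earlier
import yaml     # PyYAML

def parse_front_matter(text: str) -> dict:
    # Hugo convention: +++ delimits TOML, --- delimits YAML, { starts JSON.
    if text.startswith("+++"):
        return tomllib.loads(text.split("+++", 2)[1])
    if text.startswith("---"):
        return yaml.safe_load(text.split("---", 2)[1])
    if text.startswith("{"):
        obj, _end = json.JSONDecoder().raw_decode(text)  # ignore the body
        return obj
    return {}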
python-slugify
I found python-slugify so I did not
bother writing a slug-generating function.
As a side effect, things now work better than I would even have thought to implement, including transliteration of non-ASCII characters:

$ ./ssite new example --noedit --title "Così parlò Enrico"
/enrico-dev/staticsite/example/site/blog/2016/cosi-parlo-enrico.md
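That transliteration comes straight from python-slugify; assuming the package is installed:

from slugify import slugify

print(slugify("Così parlò Enrico"))  # cosi-parlo-enrico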
Implementing ssite serve, which monitors the file system, autoreloads when content changes, and renders everything on the fly, took about an hour. Most of that hour went into implementing rendering pages on demand.
Then I discovered that it autoreloads even when I edit staticsite's source code.
Then I discovered that it communicates with the browser and even automatically
triggers a page refresh.
I can keep vim
on half my screen and a browser in the other half, and I get
live preview for free every time I save, without ever leaving the editor.
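A sketch of how such a serve command could be wired up with the livereload package (staticsite's real implementation may differ; the paths and rebuild hook are mine):

from livereload import Server

def rebuild():
    # Placeholder for re-rendering pages on demand.
    print("content changed, re-rendering")

server = Server()
server.watch("content/", rebuild)     # monitor the file system
server.serve(root="site", port=8000)  # serves and injects the reload script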
Bootstrap
I already use Bootstrap at work, so creating the
default theme templates with it took about 10 minutes.
This morning I tried looking at my website using my mobile phone, and I
pleasantly saw it automatically turning into a working mobile version of
itself.
Pygments
Python-Markdown uses Pygments for syntax highlighting, and it can be themed just by loading a .css. So, without me really doing anything, even staticsite's syntax highlighting is themable, and there's even a nice page with a list of themes to choose from.
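Generating such a .css is a one-liner with Pygments; a sketch (the output path and theme choice are mine, and .codehilite is the default wrapper class of Python-Markdown's codehilite extension):

from pygments.formatters import HtmlFormatter

css = HtmlFormatter(style="monokai").get_style_defs(".codehilite")
with open("theme/static/highlight.css", "w") as f:
    f.write(css)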
Everything else...
Command line parsing? Straight argparse.
Logging? python's logging support.
Copying static resource files? shutil.copy2.
Parsing dates? dateutil.parser.
Timing execution? time.perf_counter.
Timezone handling? pytz.
Building the command to run an editor? string.format.
Matching site pages? fnmatch.translate.
...and then some.
If I ever decide to implement incremental rendering, how do I implement
tracking which source files have changed?
Well, for example, how about just asking git?
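One possible "just ask git" sketch (not an implemented staticsite feature; the helper name is mine):

import subprocess

def changed_source_files(since: str = "HEAD") -> list[str]:
    # List files that differ from the given committed tree.
    out = subprocess.run(
        ["git", "diff", "--name-only", since],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()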
Product | Postbooks | Tryton | GnuCash | LedgerSMB | HomeBank | Skrooge | KMyMoney | BG Financas | Grisbi |
---|---|---|---|---|---|---|---|---|---|
GUI | Y | Y | Y | N | Y | Y | Y | Y | Y |
Web UI | Y | Y | N | Y | N | N | N | N | N |
Multi-user | Y | Y | N | Y | N | N | N | N | Y |
File storage | N | Y | Y | N | Y | Y | Y | N | N |
SQL storage | Y | Y | Y | Y | N | N | Y | Y | Y |
Multi-currency | Y | Y | Y | Y | N | Y | Y | Y | |
A/R | Y | Y | Y | Y | N | Y | Y | Y | |
A/P | Y | Y | Y | Y | N | Y | Y | Y | |
VAT/GST | Y | Y | Y | Y | N | N | Y | Y | |
Inventory | Y | Y | N | Y | N | N | N | | |
Linux | Y | Y | Y | Y | Y | Y | Y | Y | Y |
Windows | Y | Y | | | | | | | |
Mac OS | Y | Y | | | | | | | |
Technology | C++, JavaScript, Node | Python | C | Perl | C | Java | | | |
License | CPAL | GPL3 | GPL2 | GPL2 | | | | | |
directhex@marceline:~$ mono --version
Mono JIT compiler version 4.3.0 (Nightly 4.3.0.21/88d2b9d Thu May 28 10:54:32 UTC 2015)
For the 2015 International Day Against DRM, I wrote a short essay on DRM for streaming services posted on the Defective by Design website. I'm republishing it here.

Between 2003 and 2009, most music purchased through Apple's iTunes store was locked using Apple's FairPlay digital restrictions management (DRM) software, which is designed to prevent users from copying music they purchased. Apple did not seem particularly concerned by the fact that FairPlay was never effective at stopping unauthorized distribution and was easily removed with publicly available tools. After all, FairPlay was effective at preventing most users from playing their purchased music on devices that were not made by Apple.

No user ever requested FairPlay. Apple did not build the system because music buyers complained that CDs purchased from Sony would play on Panasonic players, or that discs could be played on an unlimited number of devices (FairPlay allowed five). Like all DRM systems, FairPlay was forced on users by a recording industry paranoid about file sharing and, perhaps more importantly, by technology companies like Apple, who were eager to control the digital infrastructure of music distribution and consumption. In 2007, Apple began charging users 30 percent extra for music files not processed with FairPlay. In 2009, after lawsuits were filed in Europe and the US, and after several years of protests, Apple capitulated to their customers' complaints and removed DRM from the vast majority of the iTunes music catalog.

Fundamentally, DRM for downloaded music failed because it is what I've called an antifeature. Like features, antifeatures are functionality created at enormous cost to technology developers. That said, unlike features, which users clamor to pay extra for, users pay to have antifeatures removed. You can think of antifeatures as a technological mob protection racket. Apple charges more for music without DRM, and independent music distributors often use DRM-free as a primary selling point for their products.

Unfortunately, after being defeated a half-decade ago, DRM for digital music is becoming the norm again through the growth of music streaming services like Pandora and Spotify, which nearly all use DRM. Impressed by the convenience of these services, many people have forgotten the lessons we learned in the fight against FairPlay. Once again, the justification for DRM is both familiar and similarly disingenuous. Although the stated goal is still to prevent unauthorized copying, tools for stripping DRM from services continue to be widely available. Of course, the very need for DRM on these services is reduced because users don't normally store copies of music and because the same music is now available for download without DRM on services like iTunes.

We should remember that, like ten years ago, the real effect of DRM is to allow technology companies to capture value by creating dependence in their customers and by blocking innovation and competition. For example, DRM in streaming services blocks third-party apps from playing music from services, just as FairPlay ensured that iTunes music would only play on Apple devices. DRM in streaming services means that listening to music requires one to use special proprietary clients. For example, even with a premium account, a subscriber cannot listen to music from their catalog using an alternative or modified music player. It means that their television, car, or mobile device manufacturer must cut deals with their service to allow each paying customer to play the catalog they have subscribed to. Although streaming services are able to capture and control value more effectively, this comes at the cost of reduced freedom, choice, and flexibility for users, and at higher prices paid by subscribers.

A decade ago, arguments against DRM for downloaded music focused on the claim that users should have control over the music they purchase. Although these arguments may not seem to apply to subscription services, it is worth remembering that DRM is fundamentally a problem because it means that we do not have control of the technology we use to play our music, and because the firms aiming to control us are using DRM to push antifeatures, raise prices, and block innovation. In all of these senses, DRM in streaming services is exactly as bad as FairPlay, and we should continue to demand better.
.dts into an unpacked kernel tree and then running make dtbs.

Once this works, you need to compile the w1-gpio kernel module, since Debian hasn't yet enabled that. Run make menuconfig, find it under "Device drivers", "1-wire", "1-wire bus master", and build it as a module. I then had to build a full kernel to get the symversions right, then build the modules. I think there is or should be an easier way to do that, but as I cross-built it on a fast AMD64 machine, I didn't investigate too much.
Insmod-ing w1-gpio then works, but for me it failed to detect any sensors. Reading the data sheet, it looked like a pull-up resistor on the data line was needed. I had enabled the internal pull-up, but apparently that wasn't enough, so I added a 4.7 kOhm resistor between pin 3 (VDD_3V3) on P9 and the pin (GPIO_45) on P8. With that in place, my sensors showed up in /sys/bus/w1/devices and you can read the values using cat.
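A small sketch of reading those sysfs files from Python, assuming DS18B20-style sensors (family code 28; the w1_slave format below is the standard kernel one):

import glob

# The second line of w1_slave ends with e.g. "t=21312" (millidegrees C);
# the first line ends with "YES" when the CRC check passed.
for path in glob.glob("/sys/bus/w1/devices/28-*/w1_slave"):
    with open(path) as f:
        lines = f.read().splitlines()
    if lines and lines[0].endswith("YES"):
        millic = int(lines[1].rsplit("t=", 1)[1])
        print(path, millic / 1000.0, "°C")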
In my case, I wanted the data to go into collectd and then to graphite. I first tried using an Exec plugin, but never got it to work properly. Using a Python plugin worked much better, and my graphite installation is now showing me temperatures.
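A minimal sketch of what such a collectd Python plugin could look like (not the author's actual plugin; this only runs inside collectd's embedded interpreter, loaded via the python plugin in collectd.conf):

import glob
import collectd  # provided by collectd itself, not pip

def read(data=None):
    for path in glob.glob("/sys/bus/w1/devices/28-*/w1_slave"):
        with open(path) as f:
            lines = f.read().splitlines()
        if lines and lines[0].endswith("YES"):
            temp = int(lines[1].rsplit("t=", 1)[1]) / 1000.0
            vl = collectd.Values(type="temperature", plugin="w1temp")
            vl.type_instance = path.split("/")[-2]  # the sensor's ID
            vl.dispatch(values=[temp])

collectd.register_read(read)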
Now I just need to add more probes around the house.